ABSTRACT:

The present invention relates to a data processing unit comprising a register file, a
register load and store buffer connected to the register file, a single memory, and a
bus having at least first and second word lines to form a double word wide bus
coupling the register load and store buffer with said single memory. The register file
has at least two sets of registers whereby the first set of registers can be coupled
with one of the word lines and the second set of registers can be coupled with the
respective other word lines. A load and store control unit is provided for
transferring data from or to the memory.

24 Claims, 9 Drawing figures 

INVENTOR (1) : 
Fleck; Rod G.

Brief Summary Text (8) : 

This object is accomplished by a data processing unit comprising a register file, a
register load and store buffer connected to the register file, a single memory, and a
bus having at least first and second word lines to form a double word wide bus
coupling the register load and store buffer with said single memory. The register file
has at least two sets of registers whereby the first set of registers can be coupled
with one of the word lines and the second set of registers can be coupled with the
respective other word lines. A load and store control unit is provided for
transferring data from or to the memory.

Brief Summary Text (9) : 

In one embodiment, the load and store control unit has means to load or store two 
consecutive words in parallel from or to said memory to or from the first and second 
set of registers. In another embodiment, one word from the memory can be split into 
two half-words which are then stored in a first register from the first set of 
registers and in a second register from the second set of registers. The half-words 
can be stored into one half of a register and the other half of the register can be 
filled up with zeros or sign-filled. 

Brief Summary Text (10) : 

In a further embodiment the bus has a plurality of word lines to form a plurality-word
wide bus and the register file has a plurality of sets of registers whereby each set
of registers is coupled with one of the word lines of said plurality of word lines. For
example, in a 64 bit data processing unit, two 32 bit half-words or four 16 bit
quarter-words can be accessed during one single cycle. The load and store control unit
of the data processing unit can therefore have means to load or store a plurality of
consecutive words in parallel from or to said memory to or from said plurality of sets
of registers. These means allow any register of any set of registers to be coupled
with any location within the memory.

Brief Summary Text (11) : 

In a further embodiment the load and store control unit of the data processing unit
can have means to load one word from said memory and to split it into a plurality of
partial-words, each partial word being stored in one of said registers of each set of
registers, respectively.

Detailed Description Text (3) : 

For coupling the register file 2 with the memory 1, a buffer/select logic 2a is
provided. In this embodiment, numeral 2b indicates the registers. 16 registers D0 to
D15 are provided, whereby each register has a bit width of a word which has, for
example, 32 bits. The registers are organized in two groups, even and odd registers.
The registers in this example are data registers but can be either address or data
registers. A second set of registers can be provided in the same way for address
registers. The bus between the memory unit 1 and the buffer/select logic 2a is 64 bits
wide, so that two consecutive words in the memory 1 can be addressed. A load/store
control unit 2d addresses the memory unit 1 and selects the respective registers 2b
during a transfer from the register file 2 to the memory unit 1 or vice versa. The 
register file 2 comprises furthermore a second buffer/select logic 2c coupling a 
plurality of execution units 4, 5, and 6 thereto. A second bus 3 is provided as a link 
between the buffer/select logic 2c and the execution units 4, 5, and 6. Through the 
respective buffer/select logic 2a or 2c at least two registers, one in each group, for 
example, an even and an odd register, can be accessed at the same time. 

Detailed Description Text (6) : 

The special instructions provide a "load double word to a register" instruction. The
double word is loaded from the memory to the multiplexer units 8 and 9 through the
data output lines 1a and 1d. In this mode units 12 and 13 operate as multiplexers
coupling the data lines 1a with the odd registers or with even registers and the data
lines 1d with the even registers or the odd registers, respectively. The data
processing unit can have a special selecting unit that allows any register in each
group to be selected in this instruction. A simplified embodiment selects only one
register and the second register is automatically the register adjacent to the
selected one. For example, if the even register D4 is selected, the adjacent odd
register would be register D5, or if the odd register D7 is selected, the
adjacent even register would be D6. The double word in the memory can be located at
aligned addresses, for example word 1e and consecutive word 1f, or it can be accessed
at unaligned addresses, such as word 1f and consecutive word 1g. The multiplexers 7,
8, 9, and 10 align the respective data and distribute them to the respective registers
or memory cells.
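
The register pairing and the aligned or unaligned double word access described above can be illustrated with a minimal Python sketch. It is a model under stated assumptions, not the patented hardware: the memory is taken to be a list of 32 bit words, the names `adjacent` and `load_double_word` are hypothetical, and the rule that the word at the lower address goes to the even register is an assumption made only for illustration.

```python
def adjacent(reg):
    """Return the partner register in the other group (D4 <-> D5, D7 <-> D6)."""
    return reg ^ 1


def load_double_word(memory, regs, word_addr, reg):
    """Load memory[word_addr] and memory[word_addr + 1] into an even/odd pair.

    Assumption: the word at the lower address goes to the even register.
    word_addr may be even (aligned) or odd (unaligned).
    """
    even, odd = (reg, adjacent(reg)) if reg % 2 == 0 else (adjacent(reg), reg)
    regs[even] = memory[word_addr]
    regs[odd] = memory[word_addr + 1]
    return even, odd


memory = [0x11111111, 0x22222222, 0x33333333, 0x44444444]
regs = [0] * 16
aligned_pair = load_double_word(memory, regs, 0, 4)    # words 0 and 1 -> D4, D5
unaligned_pair = load_double_word(memory, regs, 1, 7)  # words 1 and 2 -> D6, D7
```

Selecting D7 in the second call picks D6 as the partner automatically, which mirrors the simplified embodiment in which only one register is named.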

Detailed Description Text (8) : 

A second type of instruction which can be executed according to the present invention
is a so-called "load two half-words (packed)" instruction. With this instruction one
word from either data lines 1a or 1d is loaded and split by unit 8 or 9 into
half-words, which are placed in the respective lower halves of a word. Optionally units
12 and 13 can either sign-extend or zero-extend the respective half-words to words. In
other words, in this embodiment, the 16 bit half-words are extended to 32 bits. Unit 8
or unit 9 splits the word received from lines 1a or 1d into two half-words and
distributes them through units 12 and 13 to the lower halves of the respective even
and odd registers. In units 12 and 13 these half-words can be extended to words either
by filling the upper halves with zeros or by sign extending the upper halves. If the
sign of a half-word is negative, the upper half of the respective register is filled
with "1", otherwise with "0". If units 12 and 13 are deactivated, the half-words are
stored into the lower halves of the respective even and odd registers without changing
their upper halves. In a simplified version the least significant memory half-word is
always stored into an even register and the most significant half-word is stored into
an odd register adjacent to the even register.
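
The three behaviours of the packed half-word load (sign extension, zero extension, or leaving the upper halves untouched) can be sketched as follows. This is an illustrative Python model of the simplified version only; the function names and the `extend` switch, which stands in for units 12 and 13 being active or deactivated, are hypothetical.

```python
def sign_extend16(half):
    """Sign-extend a 16 bit value to 32 bits."""
    return (half | 0xFFFF0000) if half & 0x8000 else half


def load_two_halfwords(word, regs, even, extend="sign"):
    """Split one 32 bit memory word into the lower halves of an even/odd pair.

    extend: "sign", "zero", or None (hypothetical switch modelling units 12
    and 13 being active or deactivated).
    """
    halves = (word & 0xFFFF, (word >> 16) & 0xFFFF)  # least significant first
    for reg, half in zip((even, even + 1), halves):
        if extend == "sign":
            regs[reg] = sign_extend16(half)
        elif extend == "zero":
            regs[reg] = half
        else:
            # Extension disabled: the upper half keeps its old content.
            regs[reg] = (regs[reg] & 0xFFFF0000) | half


regs = [0] * 16
load_two_halfwords(0x80017FFF, regs, 2)  # D2 = 0x00007FFF, D3 = 0xFFFF8001
```

The least significant half-word lands in the even register and the most significant one in the adjacent odd register, as in the simplified version described above.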

Detailed Description Text (9) : 

A third type of instruction which can be executed according to the present invention
is a so-called "load two signed fractions" instruction. With this instruction one word
from either data lines 1a or 1d is loaded and split by unit 8 or 9 into half-words,
which are placed in the upper halves of a respective word. Optionally units 12 and 13
can zero-extend the respective half-words to words. Unit 8 or unit 9 splits the word
received from lines 1a or 1d into two half-words representing the upper and lower half
of the word and distributes them through units 12 and 13 to the upper halves of the
respective even and odd registers. In units 12 and 13 these half-words can be extended
to words by filling the lower halves with "0". If units 12 and 13 are deactivated, the
half-words are stored into the upper halves of the respective even and odd registers
without changing their lower halves. In a simplified version the least significant
memory half-word is always stored into an even register and the most significant
half-word is stored into an odd register adjacent to the even register.
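
A short Python sketch may help to show why placing the half-words in the upper halves suits fractional data. It assumes, as an interpretation not stated in the text, that each half-word is a signed 16 bit fraction and that a register is read as a signed 32 bit fraction; `load_two_fractions` and `as_fraction` are hypothetical names.

```python
def load_two_fractions(word, regs, even, zero_fill=True):
    """Place the two half-words of a memory word into the upper register halves."""
    halves = (word & 0xFFFF, (word >> 16) & 0xFFFF)  # least significant first
    for reg, half in zip((even, even + 1), halves):
        # With zero_fill off, the lower half keeps its old content.
        lower = 0 if zero_fill else regs[reg] & 0xFFFF
        regs[reg] = (half << 16) | lower


def as_fraction(reg_value):
    """Read a 32 bit register as a signed fraction in [-1, 1) (assumed format)."""
    signed = reg_value - (1 << 32) if reg_value & 0x80000000 else reg_value
    return signed / (1 << 31)


regs = [0xFFFFFFFF] * 16
load_two_fractions(0x4000C000, regs, 0)
low_value = as_fraction(regs[0])   # half-word 0xC000 read as a fraction
high_value = as_fraction(regs[1])  # half-word 0x4000 read as a fraction
```

Under this reading, a half-word keeps its numerical value when it is moved to the upper half and the lower half is zero filled, so no separate scaling step is needed.
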

Detailed Description Text (10) :

A fourth type of instruction which can be executed according to the present invention
is a so-called "store two half-words (packed)" instruction. With this instruction the
lower half-words of an even and an odd register are fed to either concatenating unit
11 or 14. The two half-words are combined to one word and then stored in the memory
unit 1 through multiplexer 7 or 10 and either data input lines 1b or 1c.
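
The concatenation performed by the packed store can be sketched in Python as the inverse of the packed load. The ordering used here (the even register supplies the least significant half-word) is an assumption that mirrors the simplified load described earlier, and `store_two_halfwords` is a hypothetical name.

```python
def store_two_halfwords(regs, even, memory, word_addr):
    """Concatenate the lower halves of an even/odd pair into one memory word.

    Assumption: the even register supplies the least significant half-word.
    """
    low = regs[even] & 0xFFFF
    high = regs[even + 1] & 0xFFFF
    memory[word_addr] = (high << 16) | low


regs = [0] * 16
regs[2] = 0xFFFF8001   # upper half is ignored by the store
regs[3] = 0x00001234
memory = [0] * 4
store_two_halfwords(regs, 2, memory, 1)  # memory[1] becomes 0x12348001
```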

Detailed Description Text (12) : 

Finally, a sixth type of instruction which can be executed according to the present
invention is a so-called "store double word from data registers" instruction. With
this instruction the contents of an even and an odd register are fed to either
multiplexer unit 7 or 10 and stored in the memory unit through data input lines 1b
and 1c. This instruction works in the same way as the "load double word to a
register" instruction described above. Units 7 and 10 operate as multiplexers
distributing the content of each register to either data input lines 1b or 1c. Units
11 and 14 are deactivated so that units 7 and 10 each receive the full word stored in
an even or odd register at their inputs.

Detailed Description Text (13) : 

This principle of arranging the memory and the register file can be easily extended.
For example, four different sets of registers can be provided and the addressing of the
memory can be extended by a four word wide bus, so that four consecutive words can be
loaded and stored at a time.

Detailed Description Text (16) : 

FIG. 4 shows a second type of packed arithmetic or logical operations. Each of three
registers 20, 25 and 26 is divided into four parts. In this embodiment, each part
contains 8 bits. Four operator units 21, 22, 23, and 24 are provided, one associated
with each 8 bit part of registers 20, 25 and 26. The four parts of registers 20 and 25
provide the input values for each operator unit 21, 22, 23, and 24, whereas the output
signals of each operator unit 21, 22, 23, and 24 are fed to the respective parts of
register 26.
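
The four independent 8 bit operator units can be modelled in Python as one operation applied per byte lane, with no carry passing between lanes. This is a minimal sketch; `packed_op8` is a hypothetical name and the operator is passed in as an ordinary function.

```python
import operator


def packed_op8(a, b, op):
    """Apply op to each of the four 8 bit lanes of two 32 bit words."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        # Truncate to 8 bits so nothing spills into the neighbouring lane.
        result |= (op(x, y) & 0xFF) << shift
    return result


packed_sum = packed_op8(0x01FF10F0, 0x01010110, operator.add)   # 0x02001100
packed_and = packed_op8(0xFF00FF00, 0x0F0F0F0F, operator.and_)  # 0x0F000F00
```

In the addition, the lanes holding 0xFF + 0x01 and 0xF0 + 0x10 wrap to 0x00 without disturbing the lanes next to them.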

Detailed Description Text (19) : 

These so-called packed arithmetic or logical instructions partition, in this
embodiment, a 32 bit word into several identical objects, which can then be fetched,
stored, and operated on in parallel. These instructions, in particular, allow the full
exploitation of the 32 bit word of the data processing unit according to the present
invention in DSP applications.

Detailed Description Text (20) : 

In this embodiment two packed formats can be implemented. The first format divides the 
32 bit word into two 16 bit half-word values. The second packed format divides the 32 
bit word into four 8 bit (byte) values. 

Detailed Description Text (21) : 

The loading and storing of packed values into data or address registers is supported 
by the respective load and store instructions described above. The packed objects can 
then be manipulated in parallel by a set of special packed arithmetic instructions 
that perform such arithmetic operations as addition, subtraction, multiplication, 
division, etc. For example, a multiply instruction performs two 16 bit
multiplications in parallel as shown in FIG. 5.
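
A packed multiply of this kind can be sketched in Python as two independent 16 bit multiplications on the halves of two 32 bit operands. FIG. 5 is not reproduced in this excerpt, so the details here are assumptions: the half-words are treated as signed, and the two full products are simply returned as a pair. `to_signed16` and `packed_mul16` are hypothetical names.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def packed_mul16(a, b):
    """Multiply the lower and the upper half-words of a and b independently."""
    low_product = to_signed16(a) * to_signed16(b)
    high_product = to_signed16(a >> 16) * to_signed16(b >> 16)
    return low_product, high_product


products = packed_mul16(0x0003FFFE, 0x00040005)  # lower: -2 * 5, upper: 3 * 4
```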

Detailed Description Text (23) : 

Addition is performed on individual packed bytes or half-words using the respective 
addition instructions and they can be extended by a saturation unit 44 which ignores 
overflow or underflow within individual bytes or half-words. The saturation unit 44 
provides each addition with a function that saturates individual bytes or half-words 
to the most positive value on individual overflow or to the most negative value on 
individual underflow. For example, compare unit 41 can compare the content of result 
register 42 with a predefined saturation value. If the content is greater than a 
predefined positive/negative saturation value, this is indicated to saturation unit 44 
and saturation unit 44 sets the content of result register 42 to the respective 
positive or negative saturation value. Saturation can be provided to a variety of 
arithmetic instructions. 
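
Saturation on packed half-words can be sketched in Python as a clamp applied to each lane separately. The sketch assumes signed 16 bit lanes with the usual limits of +32767 and -32768; the text itself only speaks of predefined saturation values, and `packed_add16_sat` is a hypothetical name.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def packed_add16_sat(a, b):
    """Add two packed pairs of signed half-words, saturating each lane."""
    result = 0
    for shift in (0, 16):
        total = to_signed16(a >> shift) + to_signed16(b >> shift)
        total = max(-0x8000, min(0x7FFF, total))  # clamp instead of wrapping
        result |= (total & 0xFFFF) << shift
    return result


saturated = packed_add16_sat(0x7FFF8000, 0x0001FFFF)  # both lanes hit a limit
plain = packed_add16_sat(0x00010002, 0x00030004)      # no lane overflows
```

In the first call the upper lane overflows and stays at 0x7FFF while the lower lane underflows and stays at 0x8000, each independently of the other.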

Detailed Description Text (26) :

A circular buffer control unit 32 is coupled with these registers 31a, 31b, and 31c. A
load/store control unit 33 for the circular buffer is coupled with this control unit
32 and with the memory 1 and the register file. It also has access to the buffer 
storing means 31. The instruction execution unit of the CPU is indicated by numeral 34 
and receives certain control inputs as will be explained later. 

Detailed Description Text (28) : 

FIG. 6 shows such a circular buffer consisting of memory cells b1, b2, . . . b8. If
the circular buffer control unit starts accessing the buffer beginning with a starting
index of "0", the first two cells b1 and b2 and the consecutive cells are accessed
aligned, and no further control action is necessary. If a starting index of, for
example, "1" is used, or the offset is an odd number, a double word access beginning at
word b8 must access word b1 as the second word. As word b1 is not stored consecutively
with regard to word b8, load/store control unit 33 issues a second instruction into the
instruction execution unit 34 to access word b8 during a first cycle and word b1
during the following cycle. Only in this case are two access cycles necessary to load
or store data which cross the boundary of the circular buffer. As circular buffers are
usually large, such accesses are very rare compared to "normal" non-boundary-crossing
accesses.
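
The boundary case can be made concrete with a small Python model of a double word access to a circular buffer. It is a sketch only: the cycle count is a simple stand-in for the second instruction issued by the load/store control unit, and `double_word_access` is a hypothetical name.

```python
def double_word_access(buffer, index):
    """Return two consecutive words from a circular buffer and the cycle count.

    Assumption: one cycle when the second word follows the first in memory,
    two cycles only when the access wraps around the end of the buffer.
    """
    first = index % len(buffer)
    second = (first + 1) % len(buffer)
    cycles = 2 if second < first else 1
    return (buffer[first], buffer[second]), cycles


cells = ["b1", "b2", "b3", "b4", "b5", "b6", "b7", "b8"]
aligned = double_word_access(cells, 0)   # b1 and b2, one cycle
wrapped = double_word_access(cells, 7)   # b8 and b1, two cycles
```

For a buffer of N words only one starting index out of N causes the wrap, which is why such accesses become rare as the buffer grows.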

Detailed Description Text (33) : 

FIG. 9 shows a block diagram showing an example of a configuration of a data handling
unit according to the present invention to perform a FIR filter function. A memory 1
contains Data 0 to Data N-1 and coefficients COE 0 to COE N-1. The memory is addressed
by the address register file 45, which contains respective pointers and which is coupled
with a load/store address arithmetic. The memory 1 is also connected through a 64 bit
bus with the data register file 2 containing actual coefficients and data which are
calculated. The data processing unit comprises a plurality of buses 47, 48, 49 and 50
which handle the different data for execution in the different arithmetic units. Two
multipliers 51 and 52, whose inputs are coupled with the data register file through
bus 47, are provided to execute two multiplications in parallel. Furthermore two 16 bit
adders 53 and 54 are provided which are coupled through bus 50 with the results of the
multipliers 51 and 52. Bus 48 is coupled to the outputs of adders 53 and 54. Two
additional adders 55 and 56 are provided whose inputs are coupled with bus 48 and
whose outputs are coupled to bus 49. Bus 47 and therefore data register file 2 is
coupled through several lines with busses 48 and 49. Bus 50 and bus 49 are
additionally coupled with bus 48.
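
The way a double word wide bus and two multipliers shorten a FIR inner loop can be sketched in Python. The mapping of the two accumulators onto multipliers 51 and 52 and the adders is a loose assumption for illustration, integer arithmetic stands in for the fixed-point data path, and `fir_dual_mac` is a hypothetical name.

```python
def fir_dual_mac(data, coeffs):
    """Compute one FIR output sample using two multiply-accumulates per step."""
    if len(data) != len(coeffs) or len(data) % 2:
        raise ValueError("expected two equally long, even-length sequences")
    acc_a = acc_b = 0
    for i in range(0, len(data), 2):
        d0, d1 = data[i], data[i + 1]      # one double word load of data
        c0, c1 = coeffs[i], coeffs[i + 1]  # one double word load of coefficients
        acc_a += d0 * c0                   # first multiplier and adder
        acc_b += d1 * c1                   # second multiplier and adder
    return acc_a + acc_b                   # final combination of partial sums


sample = fir_dual_mac([1, 2, 3, 4], [5, 6, 7, 8])
```

Each loop iteration consumes two data words and two coefficients, so an N tap filter needs N/2 iterations instead of N.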

CLAIMS : 

1. Data processing unit comprising: 

a register file with a plurality of word-wide registers, whereby a word having a 
predefined number of bits, 

a register load and store buffer connected to said register file,

a memory,

a bus having at least first and second word lines to form a double word wide bus
coupling said register load and store buffer with said memory, whereby said register
file has at least two sets of registers,

coupling means, so that said first set of registers can be coupled with one of said
word lines and said second set of registers can be coupled with the respective other
word lines,

a load and store control unit for transferring data from or to said memory, wherein
said load and store control unit is configured to, in response to a single instruction
for the data processing unit, load one word from said memory and to split it into two
half-words which are stored in one half of a first register from said first set of
registers and in a corresponding half of a second register from said second set of
registers, respectively.

2. Data processing unit according to claim 1, wherein said load and store control unit
has means to load a first half-word from a first register of said first set of
registers and a second half-word from a second register from said second set of
registers and to concatenate both half-words to a single word and to store said word
in said memory via said data bus.

14. Data processing unit according to claim 1, wherein said load and store control 
unit has means to load or store two consecutive words in parallel from or to said 
memory to or from said first and second set of registers. 

15. Data processing unit according to claim 1, wherein said load and store control 
unit has means to load or store two consecutive words in parallel from or to said 
memory to or from said first and second set of registers. 

16. Data processing unit according to claim 1, wherein said one half of said first 
register is the lower half of said first register, whereby said corresponding half of 
said second register is therefore the lower half of said second register, and wherein 
said load and store control unit is further configured to sign fill the upper half of 
each of said first and second registers in response to said single instruction. 

17. Data processing unit according to claim 1, wherein said load and store control 
unit is configured to fill the other half of each of said first and second registers 
with zeros. 

18. Data processing unit comprising: 

a register file with a plurality of word-wide registers, whereby a word having a 
predefined number of bits, 

a register load and store buffer connected to said register file, 



a bus having at least first and second word lines to form a double word wide bus 
coupling said register load and store buffer with said memory, whereby said register 
file has at least two sets of registers, 

coupling means, so that said first set of registers can be coupled with one of said 
word lines and said second set of registers can be coupled with the respective other 
word lines, 

a load and store control unit for transferring data from or to said memory, wherein 
said load and store control unit has means to load one word from said memory and to 
split it into two half-words which are stored in a first register from said first set 
of registers and in a second register from said second set of registers, wherein said 
load and store control unit further comprises means to load said half-words into a 
lower half of a register and to sign fill the upper half of said register. 

19. Data processing unit comprising: 

a register file with a plurality of word-wide registers whereby a word having a 
predefined number of bits, 

a register load and store buffer connected to said register file, 



a bus having at least first and second word lines to form a double word wide bus 
coupling said register load and store buffer with said memory, whereby said register 
file has at least two sets of registers, 

coupling means, so that said first set of registers can be coupled with one of said 
word lines and said second set of registers can be coupled with the respective other 
word lines, 

a load and store control unit for transferring data from or to said memory, wherein
said load and store control unit has means to load one word from said memory and to 
split it into a plurality of partial-words, each partial word is stored in one of said 
registers of each set of registers, respectively, wherein said load and store control 
unit further comprises means to load said partial-words into one part of a register 
and to fill the remaining part of said register with zeros. 

20. Data processing unit comprising: 

a register file comprising a plurality of sets of word-wide registers, wherein a word 
has a predefined number of bits, 

a register load and store buffer coupled to said register file, 



a bus comprising a plurality of word lines to form an at least double word-wide bus 
coupling said register load and store buffer with said memory, 

a logic configured to couple a first set of registers with one of said plurality of 
word lines and to couple a second set of registers with another of said plurality of 
word lines, 

a load and store control unit configured to transfer data from or to said memory, 
wherein said load and store control unit is further configured to load one word from 
said memory, separate said one word into a plurality of partial words, and store said 
partial words into a plurality of said word-wide registers, each of said plurality of 
said word-wide registers storing no more than one of said partial words, and said 
partial words each stored at a same positional portion within its respective word-wide 
register, whereby gaps are created in said respective word-wide registers, the gaps 
being portions of said respective word-wide registers other than said same positional 
portion.

21. Data processing unit according to claim 20, wherein said load and store control 
unit is configured to execute a single instruction, for the data processing unit, that 
instructs the data processing unit to load said one word from said memory, separate 
said one word into said plurality of partial words, and store said partial words into 
said plurality of said word-wide registers. 

22. Data processing unit according to claim 20, wherein said same positional portion 
of any word-wide register is a lower portion of said any word-wide register, and said 
load and store control unit is configured to sign fill an upper portion of each of 
said respective word-wide registers. 

23. Data processing unit according to claim 20, wherein said load and store control 
unit is configured to zero fill said gaps. 

24. Data processing unit comprising: 

a register file comprising a plurality of sets of registers, each register being at 
least word wide, wherein a word has a predefined number of bits, 

a register load and store buffer coupled to said register file, 



a bus comprising a plurality of word lines to form an at least double word-wide bus 
coupling said register load and store buffer with said memory, 

a logic configured to couple a first set of registers with one of said plurality of 
word lines and to couple a second set of registers with another of said plurality of 
word lines, 

a load and store control unit configured to transfer data from or to said memory, 
wherein said load and store control unit is further configured to execute an instruction
that instructs the data processing unit to load one word from said memory, separate 
said one word into a plurality of partial words, and store said partial words into a 
plurality of said registers wherein said instruction also instructs the data 
processing unit to zero fill or sign fill. 

The present invention is related to a data processing unit
having a set of data registers and a set of address registers. Each register has a
width of n bits. Furthermore, there are provided address load and store buffers
associated with said address registers, data load and store buffers associated with
said data registers and a bus having a plurality of bus lines being connected to said
store buffers. A data memory unit is connected to said bus. The data registers are
arranged in such a way that at least n data registers are connected in parallel to
respective bus lines, n being greater than 1, and the address registers are arranged
in such a way, that at least m address registers are coupled in parallel to respective
bus lines, m being greater than 1. Thus, at least four registers can be accessed in

ART-UNIT: 277

PRIMARY-EXAMINER: Mai; Tan V.

ATTY-AGENT-FIRM: Blakely, Sokoloff, Taylor & Zafman

ABSTRACT:

The invention provides a method and apparatus for performing complex digital filters. 
According to one aspect of the invention, a method for performing a complex digital 
filter is described. The complex digital filter is performed using a set of data 
samples and a set of complex coefficients. In addition, the complex digital filter is 
performed using an inner and an outer loop. The outer loop steps through a number of
corresponding relationships between the set of complex coefficients and the set of
data samples. The inner loop steps through each complex coefficient in the set of
complex coefficients. Within the inner loop, the data sample corresponding to the
current complex coefficient (the complex coefficient currently identified by the inner
loop) is determined according to the current corresponding relationship (the
corresponding relationship currently identified by the outer loop). Then, in response
to receiving an instruction, eight data elements are read and used to generate a 
currently calculated complex number. These eight data elements were previously stored 
as packed data and include two representations of each of the components of the 
current complex coefficient and its current corresponding data sample. Each of these 
data elements is either the positive or negative of the component they represent. As a 
result of the manner in which these eight data elements are stored, the currently 
calculated complex number represents the product of the current complex coefficient 
and its current corresponding data sample. The currently calculated complex number is 
then added to the current output packed data. 
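
The loop structure and the eight-element trick can be sketched in Python. The exact packing used by the patent is not given in this excerpt, so the layout below is only one arrangement that satisfies the description (two representations of each component, each either positive or negative). The names `pmadd`, `complex_tap` and `complex_filter`, and the indexing of samples by `shift + k`, are assumptions made for illustration.

```python
def pmadd(a, b):
    """Multiply four element pairs, then add adjacent products (4 -> 2)."""
    return [a[0] * b[0] + a[1] * b[1], a[2] * b[2] + a[3] * b[3]]


def complex_tap(coeff, sample, acc):
    """Accumulate coeff * sample using one packed multiply-add."""
    cr, ci = coeff
    sr, si = sample
    packed_coeff = [cr, -ci, ci, cr]   # assumed layout: each component twice
    packed_sample = [sr, si, sr, si]   # assumed layout: each component twice
    real, imag = pmadd(packed_coeff, packed_sample)
    return [acc[0] + real, acc[1] + imag]


def complex_filter(coeffs, samples):
    """Outer loop over alignments, inner loop over the complex coefficients."""
    outputs = []
    for shift in range(len(samples) - len(coeffs) + 1):   # outer loop
        acc = [0, 0]
        for k, coeff in enumerate(coeffs):                # inner loop
            acc = complex_tap(coeff, samples[shift + k], acc)
        outputs.append(tuple(acc))
    return outputs


product = complex_tap((1, 2), (3, 4), [0, 0])   # (1+2i)(3+4i) = -5+10i
filtered = complex_filter([(1, 0), (0, 1)], [(1, 1), (2, 0), (0, 3)])
```

Storing the negated imaginary part in advance lets a single multiply-add produce both the real part (cr*sr - ci*si) and the imaginary part (ci*sr + cr*si) without a separate subtraction.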

14 Claims, 16 Drawing figures 



L9: Entry 13 of 36

File: USPT

May 22, 2001

DOCUMENT-IDENTIFIER: US 6237016 B1

** See image for Certificate of Correction **

TITLE: Method and apparatus for multiplying and accumulating data samples and complex
coefficients

Detailed Description Text (9) : 

The decode unit 140 is shown including packed data instruction set 145 for performing
operations on packed data. In one embodiment, the packed data instruction set 145
includes the following instructions: a packed multiply-add instruction(s) (PMADD) 150,
a pack instruction(s) (PACK) 155, an unpack/interleave instruction(s) (PUNPCK) 160, a
packed shift instruction(s) 165, a PXOR instruction(s) (PXOR) 170, a packed add
instruction(s) (PADD) 175, a packed subtract instruction(s) (PSUB) 180, and a move
instruction(s) 185. The operation of each of these instructions is further described
herein. While these packed data instructions can be implemented to perform any number
of different operations, in one embodiment these packed data instructions are those
described in "A Set of Instructions for Operating on Packed Data," filed on Aug. 31,
1995, Ser. No. 08/521,360. Furthermore, in one embodiment, the processor 105 is a
pipelined processor (e.g., the Pentium processor) capable of completing one or more of
these packed data instructions per clock cycle (ignoring any data dependencies and
pipeline freezes). In addition to the packed data instructions, processor 105 can
include new instructions and/or instructions similar to or the same as those found in 
existing general purpose processors. For example, in one embodiment the processor 105 
supports an instruction set which is compatible with the Intel Architecture 
instruction set used by existing processors, such as the Pentium processor. 
Alternative embodiments of the invention may contain more or less, as well as 
different, packed data instructions and still utilize the teachings of the invention. 

Detailed Description Text (31) : 

FIG. 5 illustrates the operation of the unpack instruction according to one embodiment 
of the invention. In one embodiment, the unpack instruction interleaves the low-order 
data elements from a first operand 510 and a second operand 520. The numbers inside
each packed data item identify the data elements for purposes of illustration. Thus,
data element 0 of the first operand 510 is stored as data element 0 of a result 530. 
Data element 0 of the second operand 520 is stored as data element 1 of the result 
530. Data element 1 of the first operand 510 is stored as data element 2 of the result 
530 and so forth, until all data elements of the result 530 store data elements from 
either the first operand 510 or the second operand 520. The high-order data elements 
of both the first and second operand are ignored. By choosing either the first operand 
510 or the second operand 520 to be all zeroes, the unpack may be used to unpack 
packed byte data elements into packed word data elements, or to unpack packed word 
data elements into packed dword data elements, etc. In an alternate embodiment, the 
high-order bytes of each packed data item are interleaved into the result. 
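
The interleaving can be sketched in Python with packed operands modelled as lists whose index 0 is data element 0. This is an illustrative model of the behaviour described above, not the processor's instruction; `unpack_low` is a hypothetical name.

```python
def unpack_low(first, second):
    """Interleave the low-order halves of two equally long packed operands."""
    half = len(first) // 2
    result = []
    for i in range(half):
        result.append(first[i])    # element i of the first operand
        result.append(second[i])   # element i of the second operand
    return result


interleaved = unpack_low([0, 1, 2, 3], [10, 11, 12, 13])
widened = unpack_low([0xAA, 0xBB, 0xCC, 0xDD], [0, 0, 0, 0])
```

With an all-zero second operand, each low-order element of the first operand is followed by a zero element, which is how the unpack widens bytes to words when the result is read at twice the element size.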

US Reference Patent Number (40) : 

ART-UNIT: 278

PRIMARY-EXAMINER: Maung; Zarni

ATTY-AGENT-FIRM: Conley, Rose & Tayon Kowert; Robert C. Daffer; Kevin L.



A multimedia extension unit (MEU) is provided for performing various multimedia-type
operations. The MEU can be coupled either through a coprocessor bus or a local CPU bus 
to a conventional processor. The MEU employs vector registers, a vector ALU, and an 
operand routing unit (ORU) to perform a maximum number of the multimedia operations 
within as few instruction cycles as possible. Complex algorithms are readily performed 
by arranging operands upon the vector ALU in accordance with the desired algorithm 
flowgraph. The ORU aligns the operands within partitioned slots or sub-slots of the 
vector registers using vector instructions unique to the MEU. At the output of the 
ORU, operand pairs from vector source or destination registers can be easily routed 
and combined at the vector ALU. The vector instructions employ special load/store 
instructions in combination with numerous operational instructions to carry out 
concurrent multimedia operations on the aligned operands. 

12 Claims, 19 Drawing figures 

a destination register

Detailed Description Text (113) : 

FIG. 17 illustrates the vldb vd, mem128 and vstb mem64, vsh load/store instructions 
wherein 16 byte load and store operations occur in a 2:1 byte interleave pattern. A 
10-bit load from memory address .alpha. maps the lower half of each slot s (i.e., 
lower half sub-slot) to the memory byte at address .alpha.+s; and it maps the upper 
half of each slot (i.e., upper half sub-slot) to the memory byte at address 
.alpha.+s+8. As a result, the MEU performs independent but identical operations on two 
sets of data that reside in two adjacent 8 byte octets of memory. 
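
The address mapping described above may be sketched as follows. This is an illustrative model only: the function name interleaved_load, the constant SLOTS, and the representation of a slot as an (upper, lower) tuple are assumptions made for the sketch, with eight slots assumed from the 16 byte example.

```python
SLOTS = 8  # assumed: eight slots per vector register, two sub-slots each


def interleaved_load(memory, alpha):
    """Model the 2:1 byte interleave of a 16 byte load.

    Slot s receives the byte at alpha + s in its lower half sub-slot and
    the byte at alpha + s + 8 in its upper half sub-slot. The register is
    returned as a list indexed by slot number, each entry (upper, lower).
    """
    return [(memory[alpha + s + 8], memory[alpha + s]) for s in range(SLOTS)]
```

Under this model the lower sub-slots of all eight slots hold the first octet and the upper sub-slots hold the adjacent octet, so one vector operation acts on both octets at once.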

Detailed Description Text (116) : 

The interleave mapping for 10-bit partitions is completely transparent to the 
programmer as long as only 10-bit loads/stores and vector instructions are performed 
on a given set of data. Interleaved mapping of 20-bit partitions is also transparent 
to the programmer if only 20-bit operations are performed. However, if 10-bit and 
20-bit operations are mixed, then care must be taken to understand the mapping so that 
the expected results are produced. The interleaving can be very useful, for example, 
if a 10-bit load from an octet-sized memory location automatically expands and 
interleaves the byte-wide memory data to the upper portion of 20-bit partitions. The 
20-bit operation can be immediately performed on this data without the need for 
explicit format conversions. Subsequently, 10-bit stores to octets can automatically 
perform the inverse 20-bit to 10-bit packing function. Thus, the present store 
operation, namely vstb mem64, vsh, performs packing of n+4 bits within a slot of a 
vector register to n/2 bits within an address of the memory unit. Given n=16, 
20-bit-to-8-bit packing can occur as part of the store operation. Additional 
operations, such as move or shift operations, need not occur to perform a packing 
function. Packing serves to store the most significant bits from a slot. Unpacking is 
an operation by which n/2 bits from a memory address are loaded into n+4 bit locations 
within a slot. If n=16, then a load operation such as vldb vdh, mem64 causes 8 bits 
within a memory address to be loaded into a 20-bit slot. Utilizing load and store 
functions in such a manner thereby avoids having to implement separate unpack and pack 
instructions, respectively, within the MEU instruction set. Accordingly, the same 
result can be achieved but with fewer instructions. For MPEG, 8-bit pixels are 
unpacked to 20-bit numbers for DCT or IDCT manipulations, then the results are 
repacked to 8-bit pixels. The internals of the DCT and IDCT operations require more 
than 8 bits of precision, for which packing and unpacking are particularly 
advantageous. 
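
The unpack-on-load and pack-on-store behavior for n=16 may be sketched as follows. The exact bit position of the byte within the slot is not stated in the text above; this sketch assumes, for illustration only, that the 8 memory bits occupy the 8 most significant bits of the 20-bit slot. The names unpack_on_load and pack_on_store are hypothetical.

```python
N = 16
SLOT_BITS = N + 4               # 20-bit slot
BYTE_BITS = N // 2              # 8 bits per memory address
SHIFT = SLOT_BITS - BYTE_BITS   # assumed placement: byte in the top 8 bits


def unpack_on_load(octet):
    """Expand each 8-bit memory value into a 20-bit slot value."""
    return [(b & 0xFF) << SHIFT for b in octet]


def pack_on_store(slots):
    """Keep only the most significant 8 bits of each 20-bit slot value."""
    return [(v >> SHIFT) & 0xFF for v in slots]
```

In this model, intermediate results may use the low-order bits of the slot for extra precision, and those bits are simply dropped when the slot is stored back to an octet.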

Detailed Description Text (119) : 

Movement of data not only between slots, but between sub-slots is particularly helpful 
when performing MPEG motion compensation on 8-bit pixel values. In the example shown 
above, a single load instruction which causes interleaving of 16 bytes, followed by 
four move and four sub-slot routing instructions performs the same function but in a 
more efficient manner than doing unaligned memory references. Thus, MPEG motion 
compensation on a 1.times.8 block is advantageously performed by a single interleaving 
load operation, followed by a single vector instruction containing three move 
operations (mov) and five sub-slot swapping operations (blbh) across five slot 
midpoints. 

Detailed Description Paragraph Table (7) : 
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;16 video bytes are in data in memory (the MSB, A, is shown on left): 
;    ABCD EFGH IJKL MNOP 
;need to extract 8 unaligned bytes from center: 
;    FGHI JKLM 
;load 16 bytes into register v0 (load does interleaving) 
vldb v0, byte ptr [esi]    ;esi points to byte "P" 
;now v0 contains AIBJ CKDL EMFN GOHP 
;in slots:       7766 5544 3322 1100 
;use 20-bit routing ops to move data across 10-bit routing barrier 
{mov mov mov blbh blbh blbh blbh blbh} word v0, v0, v0 (21076543) 
;now v0 contains FNGO HPIA JBKC LDME = FxGx HxIx JxKx LxMx 
;store 8 bytes into memory 
vstb byte ptr [edi], v0h   ;*[edi] contains FGHI JKLM 
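
The listing above can be traced with a small Python model. It is a sketch under stated assumptions rather than a definition of the instructions: the interleaving load places the byte at offset s in the lower sub-slot and the byte at offset s+8 in the upper sub-slot of slot s; mov copies a source slot unchanged; blbh swaps the two sub-slots of the source slot; and the store writes the upper sub-slot of slot s to offset s. The function names are hypothetical, and the behavior of blbh and of the store is inferred from the comments in the listing.

```python
def interleaved_load(memory, base):
    """Slot s = (upper, lower) = (memory[base + s + 8], memory[base + s])."""
    return [(memory[base + s + 8], memory[base + s]) for s in range(8)]


def route(reg, ops, sources):
    """Apply one routing op per destination slot.

    ops and sources are listed from slot 7 down to slot 0, matching the
    order in which they are written in the listing.
    """
    out = [None] * 8
    for dest, op, src in zip(range(7, -1, -1), ops, sources):
        upper, lower = reg[src]
        # mov copies the slot; blbh (assumed) swaps its two sub-slots.
        out[dest] = (upper, lower) if op == "mov" else (lower, upper)
    return out


def store_upper(reg):
    """Assumed store behavior: offset s receives the upper sub-slot of slot s."""
    return [reg[s][0] for s in range(8)]


def extract_unaligned(block):
    """Trace the listing for a 16-character block written MSB first."""
    memory = list(reversed(block))  # lowest address holds the rightmost byte
    v0 = interleaved_load(memory, 0)
    v0 = route(v0, ["mov"] * 3 + ["blbh"] * 5, [2, 1, 0, 7, 6, 5, 4, 3])
    return "".join(reversed(store_upper(v0)))  # print MSB first again
```

Calling extract_unaligned("ABCDEFGHIJKLMNOP") follows the same register contents as the comments in the listing and yields the eight center bytes.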
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ABSTRACT : 
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